🕷️️ Job Radar • SCRAPING

freelancer.com 2026-04-16 🟠

🔹 OCR Blood Test Extractor
👤 Client: 🇮🇳 Ahmedabad, India (member since 2014-12-07)
💰 Price: $81 (average bid)
🚩 Problem: Extract numerical results from multi-page PDF blood test reports with varying layouts.
📦 Existing: Not specified

Specifications:

[Target] Extract and structure key test result data (values, reference ranges, units) from PDFs of blood tests.
[Method] Develop a .NET Core application using Tesseract OCR for text recognition. Consider AWS Textract as an alternative if needed.
[UI/UX] No user interface required; the tool will be a command-line program or a desktop executable.
[Stack] .NET Core, Tesseract (or AWS Textract), PDF parsing libraries (iTextSharp or PDFBox).
[Security] Ensure data privacy and security during processing. Use secure storage for any temporary files.
[Format] Output structured data in CSV or JSON format.
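
A minimal sketch of the [Format] requirement, shown in Python for brevity even though the posting asks for .NET Core. The field names (`test`, `value`, `unit`, `ref_low`, `ref_high`) and the sample record are assumptions for illustration, not part of the client's brief.

```python
import csv
import io
import json

# Hypothetical column order for one extracted result row.
FIELDS = ["test", "value", "unit", "ref_low", "ref_high"]


def to_json(records):
    """Serialize a list of result dicts as a JSON array."""
    return json.dumps(records, indent=2)


def to_csv(records):
    """Serialize the same records as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


# Made-up sample record, only to exercise both writers.
records = [
    {"test": "hemoglobin", "value": 13.5, "unit": "g/dL",
     "ref_low": 12.0, "ref_high": 16.0},
]

if __name__ == "__main__":
    print(to_json(records))
    print(to_csv(records))
```

Keeping one flat record shape for both formats means the CSV and JSON outputs can be produced from the same in-memory list.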

Workflow:

1. Define a schema for the expected fields and their mappings to test names.
2. Develop OCR functionality using Tesseract (or AWS Textract) to read text from PDFs.
3. Parse extracted text to identify and map numerical results, reference ranges, and units to specific tests.
4. Handle variations in layout by implementing robust pattern recognition and fallback mechanisms.
5. Validate the accuracy of extracted data through spot-checking against original documents.
6. Consolidate all extracted data into a single CSV or JSON file for easy consumption.
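
Steps 1, 3 and 4 above could be sketched roughly as follows. This is an illustration in Python rather than the .NET Core stack the posting requests, and it assumes the OCR step has already produced one text line per result. The schema entries, alias lists, and both line patterns are hypothetical; real reports would need patterns tuned to each lab's layout.

```python
import re

# Step 1 (assumed schema): canonical field -> names a report might print.
SCHEMA = {
    "hemoglobin": ["hemoglobin", "haemoglobin", "hb"],
    "wbc_count": ["total wbc", "wbc count"],
}
ALIASES = {alias: field for field, names in SCHEMA.items() for alias in names}

NUM = r"\d+(?:\.\d+)?"
NAME = r"(?P<name>[A-Za-z][A-Za-z .()]*?)\s*:?\s+"

# Primary layout: "<name> <value> <unit> <low> - <high>"
PRIMARY = re.compile(
    rf"^{NAME}(?P<value>{NUM})\s+(?P<unit>\S*[A-Za-z%/]\S*)\s+"
    rf"(?P<low>{NUM})\s*-\s*(?P<high>{NUM})\s*$"
)
# Fallback layout: only "<name> <value>" is recoverable.
FALLBACK = re.compile(rf"^{NAME}(?P<value>{NUM})\b")


def parse_line(line):
    """Return a result dict for a recognized test line, else None."""
    text = line.strip()
    for pattern in (PRIMARY, FALLBACK):
        match = pattern.match(text)
        if match is None:
            continue
        field = ALIASES.get(match.group("name").strip().lower())
        if field is None:
            return None  # a number is present, but the name is not in the schema
        groups = match.groupdict()
        low, high = groups.get("low"), groups.get("high")
        return {
            "test": field,
            "value": float(groups["value"]),
            "unit": groups.get("unit"),
            "ref_low": float(low) if low else None,
            "ref_high": float(high) if high else None,
        }
    return None


if __name__ == "__main__":
    for sample in ["Hemoglobin 13.5 g/dL 12.0 - 16.0", "Hb: 13.5", "Page 1 of 3"]:
        print(sample, "->", parse_line(sample))
```

Rows that only match the fallback come back with `None` for unit and range, which makes them easy to flag for the spot-checking in step 5.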
